Label-setting methods for Multimode Stochastic Shortest Path problems on graphs.

Stochastic shortest path (SSP) problems arise in a variety of discrete stochastic control contexts. An optimal
solution to such a problem is typically computed using the value function, which can be found by solving the
corresponding dynamic programming equations. In the deterministic case, these equations can often be solved
by the highly efficient label-setting methods (such as Dijkstra's and Dial's algorithms). In this paper we define
and study a class of Multimode Stochastic Shortest Path problems and develop sufficient conditions for the
applicability of label-setting methods. We illustrate our approach on a number of discrete stochastic control
examples. We also discuss the relationship of SSPs with discretizations of static Hamilton-Jacobi equations and
provide an alternative derivation for several fast (non-iterative) numerical methods for these PDEs.

OR/MS subject classification: Primary: dynamic programming; Secondary: Markov finite state

1. Introduction. Stochastic Shortest Path (SSP) problems constitute a large class of Markov Decision
Processes, and their accurate and efficient solution is important for numerous applications including
mathematical finance, optimal resource allocation, design of discrete-time risk-sensitive controls, and
controlled queuing in communication networks. Our goal in this paper is to study the conditions under
which an important subclass of SSPs can be solved by efficient label-setting methods.



In SSP the current state of the system at the $k$-th stage is $y_k$, an element of a finite state space
$X = \{x_1, \ldots, x_M, t = x_{M+1}\}$. At the next stage, $y_{k+1}$ is a random variable, whose probability
distribution on $X$ depends on $y_k$ and on the decision made (control value chosen) at the previous stage.
The process terminates upon reaching a special target state $t$. At each stage, our choice of control
determines the incurred cost and the overall goal is to minimize the "value function" (i.e., the expected
value of the total accumulated cost up to the termination). We provide a formal description of SSP in
Section 2; here we simply note that the dynamic programming approach yields a system of $M$ coupled
nonlinear equations for the value function. Under mild technical assumptions this system has a solution,
which can be found by "value iteration". However, since these iterations are performed in $R^M$, this can be
quite costly, especially considering the fact that infinitely many steps are generally needed for convergence.

On the other hand, SSPs can be considered as a generalization of classical deterministic shortest path
(SP) problems on directed graphs, for which there is a variety of well-understood efficient algorithms. In
particular, non-iterative label-setting methods are applicable provided the transition costs in the graph
are non-negative. If a constant $\kappa \ll M$ is an upper bound on outdegrees, Dijkstra's method [12] and
Dial's method [11] solve the deterministic dynamic programming equations in $O(M \log M)$ and $O(M)$
operations respectively. We provide a brief overview of these methods in Section 2.1, but here we simply
note that both methods hinge on the absolute causality present in a deterministic problem: the fact that
the value function is decreasing along every optimal path to $t$.

Thus, to build similar methods for SSPs, one needs to find similar causal properties in the stochastic
problem. In fact, Bertsekas showed that a Dijkstra-like method will correctly compute the value function
of an SSP if there exists a consistently improving optimal policy [4, Vol. II, p. 98]. In Section 2.2 we
define a similar notion of a consistently $\delta$-improving optimal policy, which guarantees the applicability
of a Dial-like method. Unfortunately, both of these criteria are implicit since the existence of such optimal
policies is generally not known a priori.

The main contribution of this paper is the development of explicit conditions on transition cost
function(s), which guarantee the existence of consistently improving and/or consistently $\delta$-improving
optimal policies for a large class of Multimode SSPs. The exact class of SSPs that we consider is formally
defined in Section 3, but generally our criteria apply provided

(i) each state $x \in X$ has a collection of "modes" $m_1(x), \ldots, m_r(x)$, which are (possibly overlapping)
subsets of $X$;

(ii) each individual control is restricted to one of the modes (i.e., has non-zero transition probabilities
only into the states available in that mode);

(iii) there exists an available control corresponding to each possible probability distribution over the
states in each mode;

(iv) the control-cost is defined for each mode separately as a continuous function of the corresponding
probability distribution over the states in that mode.

In this setting, it is natural to interpret the decision made at each stage as a deterministic choice among
the modes of $y_k$ plus the choice of a desirable probability distribution for the transition to one of the
states in that mode.

This class obviously includes the classical SP problem (when each mode contains only one possible
successor-state). More interestingly, it includes the problem of selecting optimal randomized/mixed
controls for deterministic shortest path problems, when such randomized/mixed controls might be available
at a discount. In Section 3.1 we consider several representative examples and discuss the differences
between explicitly causal problems (where the causality stems from a particularly simple structure of
transition probabilities) and absolutely causal problems (where the applicability of label-setting methods
stems from certain properties of transition costs, as derived in Section 3.3).

The Multimode SSPs also include (but are not limited to) the Markov chain approximations of
deterministic continuous optimal trajectory problems. (E.g., consider a vehicle starting at some point $x$
inside the domain $\Omega \subset R^n$, which is controlled to minimize the time needed to reach the boundary
$\partial\Omega$.) The value function for such problems is typically found as a viscosity solution of a static
first-order Hamilton-Jacobi PDE. It is well-known that semi-Lagrangian discretizations of that PDE (similar
to those in [13] and [?]) can also be obtained from controlled Markov processes on the underlying grids. This
approach was pioneered by Kushner and Dupuis [18] to design approximation schemes for deterministic
and stochastic continuously-controlled processes. Recent extensions include higher-order approximations
[?] and methods for stochastic differential games [17]. The resulting systems of equations are typically
treated iteratively, but relatively recently provably convergent label-setting algorithms were introduced
for several important subclasses. For the isotropic case (when the vehicle's speed depends only on its
current position in $\Omega$ and is independent of the chosen direction of motion), the corresponding PDE is
Eikonal. In 1994 Tsitsiklis introduced the first Dijkstra-like and Dial-like methods for semi-Lagrangian
discretizations of this PDE on a uniform Cartesian grid [32, 33]. The family of Dijkstra-like Fast Marching
Methods, introduced by Sethian in [23] and extended by Sethian and co-authors in [25], [16], [26], was
developed for Eulerian upwind discretizations of the Eikonal PDE in the context of isotropic front propagation
problems. A detailed discussion of similarities and differences of these approaches can be found in [25].
More recently, another Dial-like method for the Eikonal PDE on a uniform grid was introduced in [15].
For the anisotropic case, the resulting semi-Lagrangian discretization typically does not have that causal
property and the label-setting methods are not directly applicable. The label-setting Ordered Upwind
Methods [?], [25] circumvent this difficulty; the key idea behind them can be interpreted as "modifying the
computational stencil on-the-fly to ensure the causality". In the appendix of [28] we also demonstrated
that the causality is present for the first-order semi-Lagrangian discretizations of the Eikonal PDE on
arbitrary acute meshes.

In all of the above cases the proofs of causality heavily relied on a geometric interpretation of the
problem (e.g., a discretization of a particular PDE on a specific grid or mesh in $R^n$). In contrast, we
first demonstrate that the applicability of label-setting methods to Multimode SSPs can be proven even
if no geometric interpretation is available (Section 3). We then show that the absolute causality of
prior numerical methods for the Eikonal PDE can be easily re-derived from the more general criteria
introduced here. In addition, our formulation yields two new results for deterministic continuous
optimal trajectory problems (Section 4):

• a formula for the bucket-width in a Dial-like method for Eikonal PDEs on acute meshes;

• an applicability criterion for the label-setting techniques in anisotropic optimal control problems.

Finally, in Section 5 we discuss the limitations of our approach and list several related open problems.

2. Stochastic Shortest Path Problem. Typically SSP is described on a directed graph with nodes
$X = \{x_1, \ldots, x_M, t = x_{M+1}\}$. Our exposition here closely follows the standard setting described in [4].

For each $x_i$ the problem specifies a compact set of allowable controls $A_i = A(x_i)$. If $x_i$ is the current
state of the process (i.e., if $y_k = x_i$), then our choice of a control value $a \in A_i$ determines the cost of
the next transition $C(x_i, a)$ as well as the probability of transition into each node $x_j$:

\[ p(x_i, x_j, a) = p_{ij}(a) = P\left( y_{k+1} = x_j \mid y_k = x_i \text{ and the chosen control is } a \in A_i \right). \]

A class of problems where the transition cost $C(x_i, a, x_j)$ also depends on the resulting successor node $x_j$
can also be handled in the same framework by defining $C(x_i, a) = \sum_{j=1}^{M+1} p_{ij}(a) C(x_i, a, x_j)$. It is
assumed that the cost is accumulated until we reach the absorbing target $t$, i.e., $p_{tt}(a) = 1$ and
$C(t, a) = 0$ for $\forall a \in A_t$.

Consider the class of control mappings $\mu : X \mapsto \bigcup_{i=1}^{M} A_i$ such that $\mu(x_i) \in A_i$ for all
$x_i \in X$. A policy is an infinite sequence of such mappings $\pi = (\mu_0, \mu_1, \ldots)$. A stationary policy
is a policy of the form $(\mu, \mu, \ldots)$ and for the sake of brevity we will also refer to it as "the
stationary policy $\mu$".

If the process starts at $x \in X$ (i.e., $y_0 = x$), the expected cost of using a policy
$\pi = (\mu_0, \mu_1, \ldots)$ is defined as

\[ J(x, \pi) = E\left[ \sum_{k=0}^{\infty} C\big(y_k, \mu_k(y_k)\big) \right]. \]

The value function is then defined as usual, $U_i = U(x_i) = \inf_{\pi} J(x_i, \pi)$, and a policy $\pi^*$ is called
optimal provided $U(x_i) = J(x_i, \pi^*)$ for all $x_i \in X$.

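If the value function $U(x_i)$ is finite, it should satisfy the dynamic programming equations: $U(t) = 0$
and

\[ U_i = \inf_{a \in A_i} \left\{ C(x_i, a) + \sum_{j=1}^{M+1} p_{ij}(a) U_j \right\}, \quad \text{for } \forall x_i \in X \setminus \{t\}. \tag{1} \]

An operator $T$ is defined on $R^M$ component-wise by applying the right hand side of equation (1); i.e.,
for any $W \in R^M$

\[ (TW)_i = \inf_{a \in A_i} \left\{ C(x_i, a) + \sum_{j=1}^{M+1} p_{ij}(a) W_j \right\}, \tag{2} \]

with $W_{M+1} = 0$ at the target, in agreement with $U(t) = 0$. Clearly, $U = (U_1, \ldots, U_M)^T$ is a fixed
point of $T$ and one hopes to recover $U$ by value iteration:

\[ W^{n+1} := T W^n, \quad \text{starting from an initial guess } W^0 \in R^M. \tag{3} \]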

However, $T$ generally is not a contraction unless all stationary policies are known to be proper [5].

In [5] Bertsekas and Tsitsiklis demonstrated the existence of a stationary optimal policy, the uniqueness
of the fixed point of $T$, and that $W^n \to U$ for arbitrary $W^0 \in R^M$ under the following assumptions:

• (A0) All $C(x_i, a)$ are lower-semicontinuous and all $p_{ij}(a)$ are continuous functions of controls $a$.

• (A1) There exists at least one proper policy (i.e., a policy that reaches the target $t$ with probability
1 regardless of the initial state $x \in X$).

• (A2) Every improper policy $\pi$ will have cost $J(x, \pi) = +\infty$ for at least one node $x \in X$.

(A0) and the compactness of control sets $A_i$ allow us to replace "inf" with "min" in formulas (1) and (2).
(A1) corresponds to a graph connectivity assumption in the deterministic case while (A2) is a stochastic
analog of requiring all cycles to have positive cumulative penalty. (A2) also follows automatically if

\[ C_- = \min_{x \in X \setminus \{t\},\, a \in A(x)} C(x, a) > 0. \]

The convergence of value iteration provides a way for computing $U$, but generally that convergence
does not occur after any finite number of iterations (for a simple example, see Figure 1). Some error
bounds are available, but typically in an implicit form only [4, Vol. I, Section 7.2]. A recent work by
Bonet [7] provides a polynomial upper bound on the number of value iterations required to achieve a
prescribed accuracy for the case when the ratio $\|U\|_\infty / C_-$ is a priori known to be polynomially bounded.



Figure 1: A simple example with only one available control at every node; the transition probabilities are
$p_{12} = 1/2$ and $p_{21} = 1/2$, and the control cost is $C > 0$. By the symmetry it is clear that
$U_1 = U_2 = u$ and $u = C + \frac{1}{2}(u + 0)$; thus, $u = 2C$. At the same time, the generic value iteration
described by formula (3) will not converge after any finite number of steps unless $W^0 = U$.
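The geometric (but never finite) convergence in the example of Figure 1 can be checked with a few lines of Python. This is only an illustrative sketch: the function name `value_iteration_figure1` and the choice of a zero initial guess are assumptions made here, not part of the original exposition.

```python
def value_iteration_figure1(cost, w0=0.0, n_iters=10):
    """Iterate W^{n+1} = T W^n for the two-node example of Figure 1.

    Each node moves to the other node with probability 1/2 and to the
    target t (where U(t) = 0) with probability 1/2, paying `cost` per step.
    Returns the list of iterates (W_1, W_2), starting with the initial guess.
    """
    w1, w2 = w0, w0
    history = [(w1, w2)]
    for _ in range(n_iters):
        # Simultaneous update of both components, as in formula (3).
        w1, w2 = cost + 0.5 * w2, cost + 0.5 * w1
        history.append((w1, w2))
    return history


if __name__ == "__main__":
    C = 1.0
    for n, (w1, _) in enumerate(value_iteration_figure1(C, n_iters=5)):
        # The gap to the exact value 2C is halved at every iteration.
        print(n, w1, 2 * C - w1)
```

Starting from $W^0 = 0$, the gap $2C - W^n_1$ is halved at each step, so it never reaches zero in exact arithmetic.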

2.1 Label-setting methods: the deterministic case. Fast methods for deterministic discrete
control problems (e.g., searching for a shortest path in a graph or a network) can be found in all standard
references (e.g., [1], [3]) and we provide a brief overview just for the sake of completeness. The dynamic
programming equations are much simpler in this case: $U(t) = 0$ and

\[ U_i = \min_{x_j \in N(x_i)} \left\{ C_{ij} + U_j \right\}, \quad \text{for } \forall x_i \in X \setminus \{t\}, \tag{4} \]

where $U(x_i) = U_i$ is the min-cost-to-exit starting from $x_i$, $N(x_i)$ is the set of nodes to which $x_i$ is
connected, and $C_{ij} = C(x_i, x_j)$ is the cost of traversing the corresponding link. In the absence of
negative cost cycles and if every $x_i$ is connected by some path to $t$, the value function is finite and
well-defined. Value iteration (3) will converge to $U$ after at most $M$ iterations, resulting in an $O(M^2)$
computational cost.

Label-setting methods provide a better alternative if a suitable lower bound on the transition costs is
available. These methods reorder the iterations over the nodes to guarantee that each $U_i$ is recomputed
at most $\kappa$ times, where the constant upper bound on outdegrees $\kappa$ is assumed to be much smaller than
the total number of nodes $M$. For example, Dijkstra's classical algorithm [12] is a label-setting method
for the deterministic case provided all $C_{ij} \ge 0$. The idea is based on the causality of the system (4):

\[ U_i \text{ may depend on } U_j \text{ only if } U_i \ge U_j. \tag{5} \]

Such an ordering is not known in advance and has to be obtained at run-time. The method subdivides
$X$ into two classes: "permanently labeled nodes" $P$ and "tentatively labeled nodes" $L$, and the values for
$x_i$'s in $L$ are successively re-evaluated using only the adjacent values already in $P$:

\[ U(x_i) := \min_{x_j \in \hat{N}(x_i)} \left\{ C_{ij} + U_j \right\}, \quad \text{for } x_i \in L, \tag{6} \]

where $\hat{N}(x_i) = N(x_i) \cap P$. The algorithm is initialized by placing all nodes into $L$ and setting
$U(t) = 0$ and $U(x_i) = +\infty$ for $i = 1, \ldots, M$. At each stage the algorithm chooses the smallest of
tentative labels $U(\bar{x})$, "accepts" $\bar{x}$ (i.e., moves $\bar{x}$ from $L$ to $P$), and re-evaluates $U_i$ for each
$x_i \in L$ such that $\bar{x} \in N(x_i)$. Since $\bar{x}$ is the only new element in $\hat{N}(x_i)$, that re-evaluation can be
more efficiently performed by setting

\[ U(x_i) := \min\left\{ U(x_i), \, \big( C(x_i, \bar{x}) + U(\bar{x}) \big) \right\}. \tag{7} \]

The algorithm terminates once the list $L$ is empty, at which point the vector $U \in R^M$ satisfies the system
of equations (4). The necessity to sort all (finite) temporary labels dictates the use of heap-sort data
structures [1], usually resulting in the overall computational complexity of $O(M \log M)$.

In addition, if all $C_{ij} \ge \Delta > 0$, then Dial's label-setting method is also applicable [11]. The idea is to
avoid sorting temporarily labeled nodes and instead place them into "buckets" of width $\Delta$ based on their
tentative values. If $U(\bar{x})$ is the "smallest" of tentative labels and $U(x)$ is currently in the same bucket,
then even after $\bar{x}$ is permanently labeled, it cannot affect $U(x)$ since

\[ U(x) < U(\bar{x}) + \Delta \le U(\bar{x}) + C(x, \bar{x}). \]

Thus, a typical stage of Dial's method consists of "accepting" (i.e., declaring permanent) the labels of
everything in the current bucket, recomputing all nodes in $L$ adjacent to those newly labeled permanent,
switching them to other buckets if warranted by the new tentative labels, and then moving on to the
next bucket. Since inserting to and deleting from a bucket can be performed in $O(1)$ time, the overall
computational complexity of Dial's method becomes $O(M)$. In addition, while Dijkstra's approach is
inherently sequential, Dial's method is naturally parallelizable. Many other enhancements of the above
label-setting methods are available in the literature (e.g., see [3] and references therein). Most of those
enhancements can also be used with the label-setting for SSPs, provided the basic versions of the above
algorithms are applicable.
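The two deterministic label-setting methods described in this subsection can be sketched in Python as follows. This is a minimal illustration rather than a tuned implementation: the graph encoding (a dictionary `cost[x_i][x_j] = C_ij` that also contains the target with no outgoing links), the helper `reverse_links`, and the dictionary-of-sets buckets are assumptions made here. In particular, a production version of Dial's method would use a circular array of buckets instead of searching for the smallest bucket index.

```python
import heapq
import math


def reverse_links(cost):
    """Map each node x to the list of (x_i, C_ix) such that x is in N(x_i)."""
    preds = {x: [] for x in cost}
    for xi, links in cost.items():
        for xj, c in links.items():
            preds[xj].append((xi, c))
    return preds


def dijkstra(cost, target):
    """Heap-based label-setting; requires all C_ij >= 0."""
    preds = reverse_links(cost)
    U = {x: math.inf for x in cost}
    U[target] = 0.0
    accepted = set()                 # the set P of permanently labeled nodes
    heap = [(0.0, target)]           # tentative labels of nodes still in L
    while heap:
        u, x = heapq.heappop(heap)
        if x in accepted:
            continue                 # stale heap entry, label already permanent
        accepted.add(x)
        for xi, c in preds[x]:
            if xi not in accepted and u + c < U[xi]:
                U[xi] = u + c        # update (7)
                heapq.heappush(heap, (U[xi], xi))
    return U


def dial(cost, target, delta):
    """Bucket-based label-setting; requires all C_ij >= delta > 0."""
    preds = reverse_links(cost)
    U = {x: math.inf for x in cost}
    U[target] = 0.0
    accepted = set()
    buckets = {0: {target}}          # bucket k holds labels in [k*delta, (k+1)*delta)
    while buckets:
        k = min(buckets)             # simplification: a circular array avoids this search
        current = buckets.pop(k)
        accepted |= current          # accept the whole bucket at once
        for x in current:
            for xi, c in preds[x]:
                new = U[x] + c
                if xi not in accepted and new < U[xi]:
                    if U[xi] < math.inf:
                        buckets[int(U[xi] // delta)].discard(xi)
                    U[xi] = new
                    buckets.setdefault(int(new // delta), set()).add(xi)
    return U


if __name__ == "__main__":
    graph = {"a": {"b": 1, "t": 5}, "b": {"c": 1, "t": 3},
             "c": {"t": 1, "a": 1}, "t": {}}
    print(dijkstra(graph, "t"))
    print(dial(graph, "t", delta=1))
```

Both functions work backwards from the target, so each accepted node only triggers updates at the nodes that have a link into it.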

2.2 Label-setting methods: the general SSP. Given a stationary policy $\mu$ for a general SSP,
we can construct its directed dependency graph $G_\mu$ using the nodes $X \setminus \{t\}$ and connecting $x_i$ to
$x_j$ if $p_{ij}(\mu(x_i)) > 0$. Assuming (A0), (A1) and (A2), it is easy to show that the value iteration for this
problem converges after at most $M$ iterations provided there exists an optimal stationary policy $\mu^*$ such
that $G_{\mu^*}$ is acyclic. (See [4, Vol. II, Section 2.2.1].) We will refer to such SSPs as causal.

Remark 2.1 This condition seems to forbid any self-transitions (e.g., $p_{ii}(\mu^*(x_i)) > 0$ for $x_i \ne t$), but
an SSP with self-transitions can be converted into an SSP without them by setting

\[ \tilde{C}(x_i, a) = \frac{C(x_i, a)}{1 - p_{ii}(a)}, \qquad
   \tilde{p}_{ij}(a) = \begin{cases} 0 & \text{if } i = j, \\[4pt] \dfrac{p_{ij}(a)}{1 - p_{ii}(a)} & \text{if } i \ne j, \end{cases}
   \qquad i = 1, \ldots, M. \]
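The conversion in Remark 2.1 can be sketched for a single control as below, assuming it is the usual rescaling of the cost and of the remaining transition probabilities by $1/(1 - p_{ii}(a))$. The function name `remove_self_transition` and the dictionary encoding of $p_{ij}(a)$ are hypothetical choices made for this illustration.

```python
def remove_self_transition(i, cost, probs):
    """Return (new_cost, new_probs) for one control a at node x_i.

    `cost` is C(x_i, a) and `probs` maps a successor index j to p_ij(a).
    The control must leave x_i with positive probability, i.e. p_ii(a) < 1.
    """
    p_stay = probs.get(i, 0.0)
    if p_stay >= 1.0:
        raise ValueError("control never leaves the node")
    # Expected number of attempts before leaving x_i is 1 / (1 - p_ii).
    scale = 1.0 / (1.0 - p_stay)
    new_probs = {j: p * scale for j, p in probs.items() if j != i}
    return cost * scale, new_probs


if __name__ == "__main__":
    # A control that stays at node 0 half of the time.
    print(remove_self_transition(0, 2.0, {0: 0.5, 1: 0.5}))
```

The rescaled cost is the expected cost accumulated while the process keeps returning to $x_i$ under the same control.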

Remark 2.2 One obvious set of causal SSPs consists of all problems where the dependency graph is
acyclic for every stationary policy. The SSP belongs to this class if and only if for $\forall x_i, x_j \in X \setminus \{t\}$,

\[ \exists \mu_1 \text{ s.t. there is a path from } x_i \text{ to } x_j \text{ in } G_{\mu_1}
   \implies \nexists \mu_2 \text{ s.t. there is a path from } x_j \text{ to } x_i \text{ in } G_{\mu_2}. \]

We will refer to such problems as explicitly causal (see Examples 3.1 and 3.5 in the later sections). Such
explicit causality is independent of the cost function and can be determined based on the available controls
and the transition probabilities alone. The above property imposes a partial ordering on $X$; using that
partial ordering to go through the nodes, we will clearly have the value function computed correctly in
a single sweep, yielding the computational complexity of $O(M)$. Thus, the applicability of label-setting
methods described below is only important for SSPs which are causal, but not explicitly causal. This is
similar to the fact that the original Dijkstra's method is not needed to solve the deterministic SP problem
on any acyclic digraph.

According to the definition introduced by Bertsekas in [4], an optimal stationary policy $\mu^*$ is consistently
improving if

\[ p_{ij}(\mu^*(x_i)) > 0 \implies U_i > U_j. \tag{8} \]

This is a stochastic equivalent of the causality condition (5). Thus, the existence of such $\mu^*$ not only
guarantees that $G_{\mu^*}$ is acyclic, but also allows us to avoid the value iteration process altogether since
the $U_i$'s can be computed by a non-iterative Dijkstra-like method instead.

If a consistently improving optimal policy is known to exist, the new equivalent of the causal update
equation (6) for each $x_i \in L$ is now

\[ U(x_i) := \min_{a \in \hat{A}(x_i)} \left\{ C(x_i, a) + \sum_{j=1}^{M+1} p_{ij}(a) U_j \right\}, \tag{9} \]

where $\hat{A}(x_i)$ is the set of controls which make transitions possible to permanently labeled nodes only,
i.e., $\hat{A}(x_i) = \{ a \in A(x_i) \mid p_{ij}(a) = 0 \text{ for all } x_j \notin P \}$. Once $\bar{x}$ is moved from $L$ to $P$, each
$x_i \in L$ needs to be updated only if the set

\[ \hat{A}(x_i, \bar{x}) = \{ a \in \hat{A}(x_i) \mid p(x_i, \bar{x}, a) > 0 \} \]

is not empty. Finally, the new equivalent of the efficient update formula (7) is now

\[ U(x_i) := \min\left\{ U(x_i), \; \min_{a \in \hat{A}(x_i, \bar{x})} \left\{ C(x_i, a) + \sum_{j=1}^{M+1} p_{ij}(a) U_j \right\} \right\}. \tag{10} \]

If a constant $\kappa \ll M$ is an upper bound on stochastic outdegrees (i.e., if for $i = 1, \ldots, M$ we have
$\kappa \ge$ the total number of nodes $x_j$ for which $\exists a \in A(x_i)$ such that $p_{ij}(a) > 0$), then a Dijkstra-like
algorithm described above will have a computational cost of $O(M \log M)$. Upon its termination, the
resulting $U \in R^M$ will satisfy the system of equations (1). The proof of this fact is straightforward and
is listed as one of the exercises in [4].
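A minimal Python sketch of the Dijkstra-like method built on update (10) is given below, under the simplifying assumption that every control set $A(x_i)$ is finite and given explicitly. The function name `dijkstra_like_ssp` and the data layout are hypothetical; the linear search for the smallest tentative label stands in for the heap that would be needed to reach the $O(M \log M)$ cost. The code does not verify that a consistently improving optimal policy exists, so its output equals the value function only when that condition holds.

```python
import math


def dijkstra_like_ssp(controls, target):
    """Label-setting for an SSP with finitely many controls per node.

    controls[x_i] is a list of (cost, probs) pairs, one per control a,
    where probs maps a successor node x_j to p_ij(a) > 0.
    """
    nodes = set(controls) | {target}
    U = {x: math.inf for x in nodes}
    U[target] = 0.0
    accepted = set()                  # the set P
    tentative = set(nodes)            # the set L
    while tentative:
        x = min(tentative, key=lambda n: U[n])   # a heap would make this O(log M)
        if U[x] == math.inf:
            break                     # all remaining labels are infinite
        tentative.remove(x)
        accepted.add(x)
        for xi in tentative:
            for cost, probs in controls.get(xi, []):
                # Use only controls that can reach x and that lead
                # exclusively to permanently labeled nodes.
                if x in probs and all(j in accepted for j in probs):
                    value = cost + sum(p * U[j] for j, p in probs.items())
                    U[xi] = min(U[xi], value)    # update (10)
    return U


if __name__ == "__main__":
    example = {
        1: [(1.0, {"t": 1.0})],
        2: [(1.0, {1: 0.5, "t": 0.5}), (5.0, {"t": 1.0})],
    }
    print(dijkstra_like_ssp(example, "t"))
```

In the small example, the randomized control at node 2 becomes usable only after node 1 is accepted, which is exactly the ordering that a consistently improving policy guarantees.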

Here we introduce a similar definition:

Given $\delta \ge 0$, an optimal stationary policy $\mu^*$ is consistently $\delta$-improving if

\[ p_{ij}(\mu^*(x_i)) > 0 \implies U_i > U_j + \delta. \tag{11} \]

When $\delta > 0$, it is similarly easy to show that the existence of a consistently $\delta$-improving optimal policy
guarantees the convergence of a Dial-like method with buckets of width $\delta$ to the value function of the
SSP. As in the deterministic case, for $\kappa \ll M$, the resulting cost will be $O(M)$. Every consistently
$\delta$-improving policy is obviously also consistently improving; when $\delta = 0$, this reduces to the previous
definition.

Unfortunately, conditions (8) and (11) are implicit since no optimal policy is a priori known. Thus,
for a general SSP the applicability of label-setting methods is hard to check in advance. It is preferable
(and more practical) to develop conditions based on functions $C(x_i, a)$ and $p(x_i, x_j, a)$, which would
guarantee that every optimal policy is consistently improving (or $\delta$-improving). We will refer to such
SSPs as absolutely $\delta$-causal (or simply absolutely causal when $\delta = 0$). Before developing such explicit
conditions for a particular class of Multimode SSPs in Section 3, we make several remarks about the
general case.

Remark 2.3 (Consistently almost improving policies)

Comparing the deterministic causality condition (5) with the condition (8), it might seem that a
Dijkstra-like method should work whenever there exists a "consistently almost improving optimal policy",
i.e., an optimal $\mu^*$ such that

\[ p_{ij}(\mu^*(x_i)) > 0 \implies U_i \ge U_j. \]

A simple example in Figure 1 demonstrates that this is false. Indeed, for this example a Dijkstra-like
method would terminate with $U_1 = U_2 = +\infty$ even though the optimal policy is consistently almost
improving and the correct value function is $U_1 = U_2 = 2C$.

Remark 2.4 (Lower bounds on control cost)
If $\mu^*$ is an optimal policy and $a^* = \mu^*(x_i)$, then

\[ C(x_i, a^*) = U_i - \sum_{j=1}^{M+1} p_{ij}(a^*) U_j = \sum_{j=1}^{M+1} p_{ij}(a^*) (U_i - U_j). \]

This means that $C(x_i, a^*) > \delta \ge 0$ when $\mu^*$ is $\delta$-improving. Thus, when building label-setting methods,
the natural class of problems to focus on is an SSP with

(A2') $C(x_i, a) > 0$ for all $x_i \in X \setminus \{t\}$ and $\forall a \in A(x_i)$.

We note that (A2') and the compactness of all $A_i$'s imply (A2).

Remark 2.5 (Label-setting on a reachable subgraph)

Consider a reachable set $X_c$ consisting of all nodes $x \in X$ such that there exists a policy $\pi$ leading from
$x$ to $t$ with probability 1. Assumption (A1) states that $X = X_c$. If this is not the case, but the condition
(A2') holds, then $U(x_i) = +\infty$ for all $x_i \notin X_c$. If a stationary policy $\mu^*$ is optimal, then (A2') implies
$p_{ij}(\mu^*(x_i)) = 0$ whenever $x_i \in X_c$ and $x_j \notin X_c$. If $\mu^*$ also satisfies (8) on $X_c$, then a Dijkstra-like
method is still applicable. Upon its termination, the value function will be computed correctly on $X_c$ and
we will have $U(x) = +\infty$ for all $x \notin X_c$. (This is analogous to using the original Dijkstra's method on a
digraph that does not contain directed paths to $t$ from every $x \in X$.) Of course, an efficient implementation
will terminate the method as soon as all nodes remaining in $L$ have a label of $+\infty$.

Remark 2.6 (Label-setting for SSP: prior work.)

It is natural to look for classes of SSPs for which either (8) or (11) is automatically satisfied by every
optimal policy. One simple example is the deterministic case: if for every $x_i \in X \setminus \{t\}$ and $\forall a \in A(x_i)$
there exists $x_j \in X$ such that $p_{ij}(a) = 1$, then every optimal policy is consistently improving due to
(A2'). Tsitsiklis was the first to prove causality of two truly stochastic SSPs [32, 33], which he used to
develop Dijkstra-like and Dial-like methods for two special discretizations of the Eikonal PDE on a uniform
Cartesian grid. For Eikonal PDEs discretized on arbitrary acute meshes, the equivalent of property (8)
for all optimal controls was proven in [28, Appendix]. Another implementation of a Dial-like method for
the Eikonal PDE was introduced in [15]. For the optimal control of hybrid systems, a similar property
was used to build Dijkstra-like methods in [?] and [29]. It is interesting to note that of all these papers
only Tsitsiklis' work mentions the SSP interpretation of the discretizations, but even in [33] the proof of
causality is very problem-specific and relies on the properties of the PDE and on a particular choice of the
computational stencil. In Section 4 we use MSSPs to provide convergence criteria for Dijkstra's method
in the above cases as well as the bucket-width for Dial's method whenever it applies.

Remark 2.7 (Label-correcting methods for SSP.)

Whenever the value iteration converges after finitely many steps, label-correcting methods become
another viable alternative. Their implementation for the deterministic case can be found in standard
references (e.g., [1], [3]). Two such methods were introduced in [21] for the SSP considered in [?]. In a more
recent work a similar method was applied to a finite element discretization of the Hamilton-Jacobi-Bellman
PDE. In the latter case, the label-setting is used to obtain convergence-up-to-specified-tolerance
even though the equivalent of condition (8) is not satisfied. Label-setting methods have an optimal
worst-case computational cost; however, in practice label-correcting methods can outperform them on many
problems. The exact conditions under which this happens are still a matter of debate even in deterministic
problems. While clearly interesting, the comparison of their performance on SSPs is outside the scope
of the current paper.

3. Multimode Stochastic Shortest Path Problems. We will use $\Xi_n$ to denote the set of possible
barycentric coordinates in $R^n$, i.e.,

\[ \Xi_n = \left\{ \xi = (\xi_1, \ldots, \xi_n) \mid \xi_1 + \cdots + \xi_n = 1 \text{ and } \forall \xi_j \ge 0 \right\}. \]

We will further define $I(\xi) = \{ i \mid \xi_i > 0 \}$ and use $\{e_1, \ldots, e_n\}$ to denote the standard canonical
basis in $R^n$. Finally, we will use $R^n_{+,0}$ to denote the non-negative orthant in $R^n$, i.e.,
$R^n_{+,0} = \{ (x_1, \ldots, x_n) \mid \forall x_j \ge 0 \}$.

We will assume the following:

(i) For every node $x_i \in X \setminus \{t\}$ we are given a list of "modes" $\mathcal{M}_i = \mathcal{M}(x_i) = \{m_1, \ldots, m_{r_i}\}$,
where each mode $m \in \mathcal{M}_i$ is a non-empty subset of $X \setminus \{x_i\}$ and $r_i = r(x_i) = |\mathcal{M}_i| \ge 1$.

(ii) The nodes within each mode are ordered; i.e., $m = \{z_1^m, \ldots, z_{|m|}^m\}$, where $z_j^m \ne z_k^m$ if $j \ne k$.

(iii) All controls have a special structure $a = (m, \xi \in \Xi_{|m|})$ and there exists an available control
$(m, \xi) \in A(x_i)$ for all $m \in \mathcal{M}_i$ and all $\xi \in \Xi_{|m|}$.

(iv) The corresponding transition probability is

\[ p(x_i, x, (m, \xi)) = \begin{cases} \xi_j, & \text{if } x = z_j^m \text{ for some } j \in \{1, \ldots, |m|\}; \\ 0, & \text{otherwise.} \end{cases} \]

(v) The transition costs are defined for each mode separately, i.e., $C(x_i, (m, \xi)) = C^m(x_i, \xi)$.

(vi) For $\forall x_i \in X \setminus \{t\}$ and $\forall m \in \mathcal{M}_i$ the function $C^m(x_i, \xi)$ is a positive continuous function of $\xi$.

(vii) There exists a constant upper bound $\kappa$ on stochastic outdegrees; i.e.,
$\left| \bigcup_{m \in \mathcal{M}_i} m \right| \le \kappa$ for all $x_i \in X \setminus \{t\}$.

For these MSSPs it is natural to interpret the decision made at each stage as a deterministic choice of
a mode $m$ plus the choice of a desirable probability distribution for the transition to one of the successor
nodes in $m$. We note that the above framework is sufficiently flexible: each node can have its own number
of modes, each mode can have its own number of successor nodes, and different modes can have overlaps
(e.g., $z_j^{m_1}$ can be the same as $z_k^{m_2}$). The fully deterministic case is conveniently included when $|m| = 1$
for each mode $m$.

The above assumptions imply (A0) and (A2'); hence, the value iteration converges at least on the
reachable subgraph $X_c$ (see Remarks 2.4 and 2.5).

The dynamic programming equations (1) can be now rewritten as

\[ U(x) = \min_{m \in \mathcal{M}(x)} \left\{ V^m(x) \right\}, \tag{12} \]

\[ V^m(x) = \min_{\xi \in \Xi_{|m|}} \left\{ C^m(x, \xi) + \sum_{j=1}^{|m|} \xi_j U(z_j^m) \right\}. \tag{13} \]

Before developing criteria for solvability of the above equations by label-setting methods (Sections 3.2
and 3.3), we provide a number of representative examples to illustrate the MSSP framework.



3.1 MSSPs and Modeling. In this subsection we list several examples of discrete stochastic control 
problems, which are naturally modeled in the MSSP framework. Our goal is twofold: to explore the type 
of stochasticity present in MSSPs and to understand which types of MSSPs make the development of 
label-setting methods worthwhile. 

We begin by considering two very simple MSSPs, which illustrate the difference and relationship 
between explicit and absolute causalities. 



Example 3.1 For $M = 3$, suppose that each node has only one mode, and nodes $x_1, x_3$ have only one
node $t$ in their modes. I.e., the transition to $t$ is deterministic and costs $C_{it} > 0$ for $i = 1, 3$. The $x_2$'s
only mode is $m = \{x_1, x_3\}$. (See Figure 2.) Since the problem is so simple, it is clear that

\[ U_1 = C_{1t}; \qquad U_3 = C_{3t}; \]

\[ U_2 = \min_{\xi \in \Xi_2} \left\{ C^m(x_2, \xi) + (\xi_1 U_1 + \xi_2 U_3) \right\}. \]

This SSP is obviously explicitly causal: $U_2$ will be computed correctly, provided it is computed after $U_1$
and $U_3$ (see Remark 2.2).

However, whether or not this SSP is absolutely causal depends on the cost function:
Suppose $\xi^*$ is the unique minimizer of the above and $C^m$ is such that

\[ U_1 < U_2 < C^m(x_2, e_1) + U_1 < U_3. \]

If $\xi^*_2 > 0$, it is clear that the Dijkstra-like method of Section 2.2 would compute $U_2$ incorrectly (since $x_2$
would be moved from $L$ to $P$ before $x_3$). If label-setting methods were to be used here, we would need
to find conditions on $C^m(x_2, \xi)$ which make the above scenario impossible for any choice of positive $C_{1t}$
and $C_{3t}$.
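As a hypothetical numerical instance of this failure (the cost function and the values below are assumptions chosen for illustration and do not come from the source), take $C_{1t} = 1$, $C_{3t} = 3$ and $C^m(x_2, \xi) = \tfrac{3}{2} - 4 \xi_1 \xi_2$, which is positive and continuous on $\Xi_2$ since its minimum is $\tfrac{1}{2}$. Then

\[
\begin{aligned}
U_2 &= \min_{\xi_1 \in [0,1]} \left\{ \tfrac{3}{2} - 4\xi_1(1 - \xi_1) + \xi_1 \cdot 1 + (1 - \xi_1) \cdot 3 \right\}
     = \min_{\xi_1 \in [0,1]} \left\{ 4\xi_1^2 - 6\xi_1 + \tfrac{9}{2} \right\}
     = \tfrac{9}{4} \quad \text{at } \xi^* = \left( \tfrac{3}{4}, \tfrac{1}{4} \right), \\
C^m(x_2, e_1) + U_1 &= \tfrac{3}{2} + 1 = \tfrac{5}{2},
\qquad \text{so} \qquad
U_1 = 1 \;<\; U_2 = \tfrac{9}{4} \;<\; \tfrac{5}{2} \;<\; U_3 = 3.
\end{aligned}
\]

Under these assumptions a Dijkstra-like method would accept $x_1$ first, assign $x_2$ the tentative label $\tfrac{5}{2}$ (the only control leading exclusively to accepted nodes is $\xi = e_1$), and then accept $x_2$ before $x_3$ with the value $\tfrac{5}{2}$ instead of the correct $\tfrac{9}{4}$.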





Figure 2: Two simple examples of MSSP. In both cases, starting from $x_i$, one needs to select an optimal probability
distribution over two successor nodes (dashed & dotted lines) or to opt for the deterministic transition to $t$ (priced at
$C_{it} > 0$ and shown by solid lines wherever available).

Example 3.2 For a somewhat more interesting example, consider a circular doubly linked list of $M$
nodes. (See Figure 2 for the case $M = 6$.) Each $x_i$ has two modes: $m = \{x_{prev}, x_{next}\}$ and $m' = \{t\}$.

The applicability of label-setting methods seems harder to judge in this case, but it is clear that we
don't want $x_i$ moved from $L$ to $P$ before its neighbors if the optimal choice at $x_i$ involves a possible
(non-deterministic) transition to one of them. For instance,

\[ U_2 = \min\left\{ C_{2t}, \; \min_{\xi \in \Xi_2} \left\{ C^m(x_2, \xi) + (\xi_1 U_1 + \xi_2 U_3) \right\} \right\}, \]

and we note that this equation is provably absolutely causal if in Example 3.1 a Dijkstra-like method
produces correct $U_2$ for all allowable $C_{1t}$ and $C_{3t}$. In fact, if the same $C^m(x_i, \xi)$ is used for each $x_i$, the
above condition is sufficient to show the absolute causality of the full problem. This idea is generalized
in Section 3.2.

In practical terms, whether or not the MSSP in Example 3.1 is absolutely causal is irrelevant since
the value function can be easily computed directly (see Remark 2.2). On the other hand, Example 3.2
can be viewed as a variant of an optimal stopping problem, whose absolute causality would yield a more
efficient alternative to the basic value iteration when $M$ is large. We continue by considering a number
of interesting single-mode-for-each-node examples.

In the opening act of Tom Stoppard's famous play [30], the title characters engage in statistical
experimentation with supposedly fair coins. The fairness of their coins is highly suspect since they are
observing a very long and uninterrupted run of "heads". Rosencrantz (Ros) is bored by the game and
would be glad to stop playing, but Guildenstern (Guil) insists on continuing. The following two examples 
are inspired by the above. 

Example 3.3 Suppose Guil will agree to stop only after observing $K$ "heads" in a row. Ros has to
pay some fee for every toss of a coin and is interested in minimizing his expected total cost up to the
termination. Moreover, suppose that for each toss Ros can request a coin with any probability distribution
$(p, (1-p))$ on possible outcomes ("heads" vs. "tails"), but Guil intends to charge him a different fee $C(p)$
based on his request. The problem is to find an optimal $p_i^* \in [0, 1]$ that Ros should request after observing
$i$ "heads" in a row (i.e., in the state $x_i$).

Figure 3 (Left) shows the graph representation of the game for $K = 3$. Denoting $x_K = t$, we set $U_K = 0$.
Since there is exactly one mode per node and only two successor nodes ($\xi \in \Xi_2$; $\xi_1 = p$, $\xi_2 = (1-p)$),
the dynamic programming equations of this game can be rewritten as

$$U_i = \min_{\xi \in \Xi_2} \{ C(x_i,\xi) + \xi_1 U_{i+1} + \xi_2 U_0 \} = \min_{p \in [0,1]} \{ C(p) + p U_{i+1} + (1-p) U_0 \}, \quad \text{for } i = 0, \ldots, K-1.$$




Figure 3: The first Guildenstern and Rosencrantz game for $K = 3$ (Left). After $i$ "heads" in a row the game-state is $x_i$.
Transitions corresponding to "heads" and "tails" are shown by dashed and dotted lines respectively. The self-transition in
$x_0$ can be removed and replaced by a deterministic transition (solid line) with the optimal cost $C_{01}$ (Right).



We note that the self-transition in the node $x_0$ can be dealt with in the spirit of Remark 2.1; see
Figure 3 (Right). This results in a deterministic transition to $x_1$:

$$U_0 = C_{01} + U_1, \quad \text{where } C_{01} = \min_{p \in (0,1]} \frac{C(p)}{p} = \frac{C(p_0^*)}{p_0^*}.$$

After this simplification, the example satisfies all the assumptions listed for MSSPs; therefore, the
applicability of label-setting methods can be determined by checking if $C(\xi)$ satisfies any of the criteria
developed in section 3.3.
Remark 3.1 Even though this MSSP is not explicitly causal, a simple structure of the graph makes it
almost trivial for our purposes:

1. Every (eventually terminating) path from $x_i$ leads through $x_{i+1}$, which implies $U_i > U_{i+1}$. Thus,
label-setting methods are only applicable if $p_i^* = 1$ for all $i$.

2. On the other hand, recursive relations similar to the one used above can be repeatedly employed to
make this into a deterministic problem. For example,

$$U_1 = \min_{p \in [0,1]} \{ C(p) + pU_2 + (1-p)U_0 \} = \min_{p \in [0,1]} \{ C(p) + pU_2 + (1-p)(C_{01} + U_1) \}$$

$$= \min_{p \in [0,1]} \{ [C(p) + (1-p)C_{01}] + pU_2 + (1-p)U_1 \} = C_{12} + U_2,$$

where

$$C_{12} = \min_{p \in (0,1]} \frac{C(p) + (1-p)C_{01}}{p} = \frac{C(p_1^*) + (1-p_1^*)C_{01}}{p_1^*}.$$

Repeating this procedure we can compute the value function in $O(K)$ steps (counting the above
minimization as a single operation) even if some of the $p_i^*$'s are less than one (in which case the value iteration
would not converge in a finite number of steps).
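
The recursion of item 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `solve_first_game`, the uniform grid over $p \in (0,1]$ that stands in for an exact minimization, and the accumulated quantity `back` $= U_0 - U_i$ are choices made here and are not part of the original text.

```python
def solve_first_game(cost, K, grid=1000):
    """Return [U_0, ..., U_K] for the first game, with U_K = 0.

    cost(p) is the fee C(p). The minimization over p is approximated
    on a uniform grid in (0, 1] (an assumption of this sketch).
    """
    ps = [k / grid for k in range(1, grid + 1)]
    edge = []      # edge[i] approximates the deterministic cost C_{i,i+1}
    back = 0.0     # back = U_0 - U_i = C_{01} + ... + C_{i-1,i}
    for _ in range(K):
        # C_{i,i+1} = min over p of [C(p) + (1 - p) * (U_0 - U_i)] / p
        c = min((cost(p) + (1.0 - p) * back) / p for p in ps)
        edge.append(c)
        back += c
    U = [0.0] * (K + 1)
    for i in range(K - 1, -1, -1):   # U_i = C_{i,i+1} + U_{i+1}
        U[i] = edge[i] + U[i + 1]
    return U
```

With a constant fee the optimal request is $p = 1$ at every state, so `solve_first_game(lambda p: 1.0, 3)` returns one unit of cost per remaining toss. For a fee such as `0.1 + p * p` the optimal $p_0^*$ is interior, and the result can be compared against a plain value iteration on the same grid.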

Example 3.4 Now suppose that Guil will agree to stop only after observing an uninterrupted run of
$K_h$ "heads" or $K_t$ "tails"; see Figure 4.




Figure 4: The second Guildenstern and Rosencrantz game for $K_h = 4$ and $K_t = 3$. After $i$ "heads" or "tails" in a row
the game-state is $x^h_i$ or $x^t_i$ respectively.



Identifying $t = x^h_{K_h} = x^t_{K_t}$ and $x_0 = x^h_0 = x^t_0$ we can rewrite the dynamic programming equations as

$$U(t) = 0;$$

$$U(x^h_i) = \min_{p \in [0,1]} \{ C(p) + pU(x^h_{i+1}) + (1-p)U(x^t_1) \}; \quad \text{for } i = 0, \ldots, K_h - 1;$$

$$U(x^t_i) = \min_{p \in [0,1]} \{ C(p) + pU(x^h_1) + (1-p)U(x^t_{i+1}) \}; \quad \text{for } i = 0, \ldots, K_t - 1.$$

Since Remark 3.1 does not apply, in this case it is possible to have a non-trivial optimal strategy
(i.e., $p^* \in (0, 1)$), which might be computable by the label-setting methods. Their applicability can be
guaranteed by certain properties of the cost function, as will be shown by the theorems of section 3.3. For
example, this MSSP is absolutely causal (and thus efficiently computable using a Dijkstra-like method
regardless of specific values of $K_h$ and $K_t$) for

Ci(p)=3 + 2p-/-(l-p)2; or C2{p) = Vp' + (1-p)2; or C^ip) = 4 + {p - O.bf . 

(Recalling that $\xi_1 = p$ and $\xi_2 = (1-p)$, it is easy to check that Theorems 3.1, 3.2, and 3.3 apply to
$C_1$, $C_2$, and $C_3$ respectively.)

A similar analysis works when Guil is allowed to use different prices depending on the current state of
the game (i.e., with $C(x_i,p)$ instead of $C(p)$) or when the number of possible outcomes is higher (e.g.,
dice instead of coins, $\Xi_6$ instead of $\Xi_2$).
Example 3.5 Suppose a person is engaged in multi-tasking, dividing her attention between activities A
and B. This allocation of resources is described by $\xi = (\xi_A, \xi_B) \in \Xi_2$. We assume that

• per every time-unit she reaches a new milestone in exactly one of these activities;

• the probability of a milestone reached in A or B is proportional to the fraction of her attention invested
in that activity ($\xi_A$ or $\xi_B$) during that time-unit;

• the current state of the process $x_{ij}$ reflects the number of milestones reached in both activities;

• the cost (per time-unit) of all possible resource allocations is specified by $C(x_{ij},\xi)$;

• the process terminates after at least $K_A$ milestones are reached in A or at least $K_B$ milestones reached
in B;

• the goal is to minimize the total expected cost up to a termination.

A particular instance of this problem is illustrated in Figure 5.




Figure 5: The multitasking problem for $K_A = 3$ and $K_B = 2$. Each of the nodes $x_{i,K_B}$ and $x_{K_A,j}$ has a deterministic
transition to $t$ only. All other nodes $x_{i,j}$ have a single mode $(x_{i,j+1}, x_{i+1,j})$. The node $x_{K_A,K_B}$ is not needed since the
process always terminates before reaching it.
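
Because every transition in the multitasking problem increases $i + j$, its value function can be computed in a single backward sweep. The Python sketch below illustrates this; the function name `solve_multitasking`, the grid over $\xi_A$, and the parameter `exit_cost` (the price of the deterministic transition to $t$, which the example does not specify) are assumptions made for illustration only.

```python
def solve_multitasking(cost, KA, KB, exit_cost=0.0, grid=200):
    """Value function U[i][j] for the multitasking example, in causal order.

    cost(i, j, xi_A) is C(x_ij, xi) with xi = (xi_A, 1 - xi_A);
    exit_cost is the assumed price of the deterministic transition to t.
    """
    U = [[None] * (KB + 1) for _ in range(KA + 1)]
    for i in range(KA + 1):
        U[i][KB] = exit_cost          # nodes x_{i,K_B}
    for j in range(KB + 1):
        U[KA][j] = exit_cost          # nodes x_{K_A,j}
    ps = [k / grid for k in range(grid + 1)]
    # Successors of x_{i,j} are x_{i+1,j} and x_{i,j+1}, so sweeping both
    # indices downward guarantees they are already known.
    for i in range(KA - 1, -1, -1):
        for j in range(KB - 1, -1, -1):
            U[i][j] = min(
                cost(i, j, p) + p * U[i + 1][j] + (1.0 - p) * U[i][j + 1]
                for p in ps
            )
    return U
```

With a unit cost per time-unit and free exit, the value at $x_{i,j}$ reduces to the smaller of the remaining milestone counts, since a pure control is always available.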

The above MSSP is obviously explicitly causal since the number of milestones achieved in each activity 
can only increase as time goes on regardless of the chosen policy. As usual with explicitly causal SSPs, 
the causal ordering of the nodes is a priori known regardless of the cost functions and the label-setting 
methods are really not needed. However, a slight variation of the above is already computationally 
challenging: 

Example 3.6 Suppose the same person also dedicates a part of her attention to some distraction D and
her resource allocation is now $\xi = (\xi_A, \xi_B, \xi_D) \in \Xi_3$, where $\xi_D$ is the probability of getting completely
distracted and inadvertently "resetting" the process (i.e., transition into $x_{0,0}$).

If the diversion is appealing (i.e., if $C(x_{ij},\xi)$ is a decreasing function of $\xi_D$), this problem is not explicitly
causal and the applicability of label-setting methods becomes relevant. The possibility of self-transition
in $x_{0,0}$ is again dealt with in the spirit of Remark 2.1, and theorems from section 3.3 can then be used
to test for the absolute causality. Generalizations of this example (to an arbitrary number of activities
and/or partial resets due to a diversion) can be handled similarly.

We note that the MSSPs occupy a niche in between purely deterministic and generally stochastic 
shortest path problems. It is easy to see that in all of the above examples the stochastic aspect of 
the model is not due to some uncontrollable event (after all, the deterministic/pure controls are always 
available in MSSPs), but rather due to our belief that a randomized/mixed control might carry a lower 
cost. 

Remark 3.2 (Randomized/mixed controls & deterministic SP) 

In most deterministic discrete control problems mixed policies are considered unnecessary. But this is
mainly due to the fact that the cost of implementing such mixed/randomized controls is usually modeled
by a linear function, i.e., $C^m(x,\xi) = \sum_{j=1}^{|m|} \xi_j C_j$. (More generally, Theorem 3.1 will show that an optimal
control can be found among the pure controls $\{e_j\}$ for any concave cost function.) However, if the cost
is non-concave, i.e., if $C^m(x,\xi) < \sum_{j=1}^{|m|} \xi_j C^m(x,e_j)$ for at least some $\xi \in \Xi_{|m|}$, then a mixed strategy is
available "at a discount" and might be advantageous.

The methods developed in this paper are therefore most useful for MSSPs that 

• are not explicitly causal (otherwise direct methods are more efficient); 

• but are absolutely causal due to (possibly non-concave) costs satisfying criteria in Section 3.3.

Additional examples (stemming from discretizations of continuous optimal control problems) are discussed
in section 4.

3.2 Causality of MSSP and single-mode auxiliary problems. Checking whether a given MSSP
is absolutely causal can be hard, although sufficient conditions can be developed hierarchically. This
approach was already used in subsection 3.1 to show the relationship between Examples 3.1 and 3.2.

In the general case, if $\mu^*$ is an optimal policy and $\mu^*(x) = (m^*, \xi^*)$, then the dynamic programming formulas (through (13)) imply

$$U(x) = V^{m^*}(x) = C^{m^*}(x,\xi^*) + \sum_{j=1}^{|m^*|} \xi^*_j U(z^{m^*}_j).$$

If $\mu^*$ is consistently $\delta$-improving, we have $(\xi^*_j > 0) \Rightarrow V^{m^*}(x) \ge U(z^{m^*}_j) + \delta$.

Observation 3.1 For each mode $m$ let $\Xi^*_m \subset \Xi_{|m|}$ be the set of all minimizers in formula (13). If

$$(\xi_j > 0) \Rightarrow V^m(x) \ge U(z^m_j) + \delta, \quad \text{for } \forall \xi \in \Xi^*_m, \qquad (14)$$

then every optimal policy is consistently $\delta$-improving (and this MSSP is absolutely $\delta$-causal).

As a result, we can develop label-setting applicability conditions on a mode-per-mode basis. In the
following we will focus on one $x \in X \setminus \{t\}$ and one mode $m \in \mathcal{M}(x)$ to develop conditions on $C^m(x,\cdot)$
that guarantee causality for all possible values of $U(z^m_j)$'s. Since $x$ and $m$ are fixed, we will simplify the
notation by using

$$V = V^m(x); \quad C(\cdot) = C^m(x,\cdot); \quad W_j = U(z^m_j); \quad n = |m|.$$

Furthermore, interpreting $\xi$ and $W$ as column vectors in $R^n$, we define $F : \Xi_n \times R^n \mapsto R$ as follows:

$$F(\xi,W) = C(\xi) + \xi^T W.$$

The dynamic programming equation (13) can be now rewritten as

$$V = \min_{\xi \in \Xi_n} \left\{ C(\xi) + \sum_{j=1}^n \xi_j W_j \right\} = \min_{\xi \in \Xi_n} F(\xi,W). \qquad (15)$$

Once the vector $W$ is specified, this also determines the set of minimizers $\Xi^*(W) = \arg\min_{\xi \in \Xi_n} F(\xi,W)$.

Definition 3.1 The mode $m$ is absolutely $\delta$-causal if

$$(\xi^*_j > 0) \Rightarrow V \ge W_j + \delta, \quad \text{for } \forall W \in R^n_{+,0}; \ \forall \xi^* \in \Xi^*(W); \ j = 1, \ldots, n.$$

We will also refer to a mode as absolutely causal if the above holds at least with $\delta = 0$.

A simple way to interpret this definition is by considering an auxiliary single-mode MSSP on the nodes
$\{x, z^m_1, \ldots, z^m_n, t\}$ with a single mode for each node (see Figure 6). Let the transition from each $z^m_j$ to
$t$ be deterministic with cost $C_{jt} = W_j \ge 0$, and for $x$ let the mode be $m = (z^m_1, \ldots, z^m_n)$ using the
transition cost $C^m(x,\cdot)$ from the original problem.

Figure 6: An auxiliary single-mode problem for $m \in \mathcal{M}(x)$. Deterministic transitions are shown by solid lines; $n = |m| = 4$.

The mode $m$ is absolutely causal if a Dijkstra-like method solves the auxiliary single-mode problem
correctly for every vector $W \in R^n_{+,0}$. The mode is absolutely $\delta$-causal if the same is true for a Dial-like
method with buckets of width $\delta$. In fact, Example 3.1 can be viewed as such an auxiliary problem for the
mode $m \in \mathcal{M}(x_2)$ of Example 3.2. We emphasize that the absolute causality of auxiliary problems is
desirable not because we intend to use label-setting on any of them (after all, each auxiliary problem is
explicitly causal, and a direct computation is efficient; see Remark 2.2), but because the label-setting
methods might be advantageous on the original MSSP.
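
For a two-node mode, the requirement of Definition 3.1 can be spot-checked numerically on a handful of vectors $W$. The Python sketch below is only illustrative: the name `mode_is_causal`, the grid minimization, and the tolerance are assumptions, and a passing check on finitely many samples cannot prove absolute causality (which quantifies over all $W$); it can only expose violations.

```python
def mode_is_causal(cost, W_samples, delta=0.0, grid=1000, tol=1e-9):
    """Grid-based spot check of Definition 3.1 for a two-node mode (n = 2).

    cost(xi1, xi2) is C(xi); each W in W_samples is a pair (W_1, W_2) of
    non-negative successor values. Returns False as soon as a (grid)
    minimizer xi* has xi*_j > 0 while V < W_j + delta.
    """
    for W in W_samples:
        # Minimize F(xi, W) = C(xi) + xi . W over the discretized simplex.
        best_val, best_p = min(
            (cost(k / grid, 1.0 - k / grid)
             + (k / grid) * W[0] + (1.0 - k / grid) * W[1], k / grid)
            for k in range(grid + 1)
        )
        xi = (best_p, 1.0 - best_p)
        for j in range(2):
            if xi[j] > 0.0 and best_val < W[j] + delta - tol:
                return False
    return True
```

For instance, the Euclidean cost $\sqrt{\xi_1^2 + \xi_2^2}$ passes this check with $\delta = 0$ on the samples tried below, while the hypothetical cost $1 - 3.6\,\xi_1\xi_2$ (cheap mixed controls) fails already for $W = (0, 1)$.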



The conditions on the mode $m$ in Definition 3.1 are more restrictive than those in (14) since in
the latter case the $\delta$-causality is needed for only one (albeit unknown) vector $W$. Thus, Observation 3.1
yields the following sufficient condition for applicability of label-setting methods to MSSPs:

Corollary 3.1 For a general MSSP, if every mode of every node is absolutely causal, then the MSSP
is also absolutely causal and a Dijkstra-like method is applicable. If each mode $m$ is absolutely $\delta_m$-causal,
then the MSSP is absolutely $\Delta$-causal with

$$\Delta = \min_{x \in X \setminus \{t\},\ m \in \mathcal{M}(x)} \{\delta_m\}$$

and a Dial-like method is applicable if $\Delta > 0$.

We note that it is possible to have an absolutely causal MSSP some of whose modes are not absolutely 
causal. This is reminiscent of the fact that the original Dijkstra's method might be converging correctly 
even for some deterministic problems with negative transition penalties. 

3.3 Criteria on cost and absolute causality. Consider a mode $m \in \mathcal{M}(x)$ such that $n = |m| > 1$.
In view of Corollary 3.1 it is important to find additional conditions on the transition cost function
$C(\cdot) = C^m(x,\cdot)$ that guarantee $m$'s absolute $\delta$-causality.

One obvious example is $C(\xi) = \sum_{j=1}^n \xi_j C_j$, where all $C_j$'s are positive constants. In that case $F$ is linear
and equation (15) reduces to

$$V = \min_{\xi \in \Xi_n} F(\xi,W) = \min_{\xi \in \Xi_n} \left\{ \sum_{j=1}^n \xi_j (C_j + W_j) \right\} = \min_j \{ C_j + W_j \},$$

which is not different from the deterministic shortest path equation ([?]). The same principle works for
arbitrary concave costs.

Theorem 3.1 Suppose

(A3) $C : R^n \mapsto R_+$ is concave.

Then the mode $m$ is absolutely $\delta$-causal with $\delta = \min_j C(e_j)$. Moreover, $V$ can be more efficiently evaluated
as $V = \min_j \{ C(e_j) + W_j \}$.

Proof. Since $F(\xi,W) = C(\xi) + \xi^T W$, we know that the function $F(\cdot,W)$ is concave on $\Xi_n$. Thus,
if $\xi^* \in \Xi^*(W) = \arg\min F(\xi,W)$ then

$$(\xi^*_j > 0) \Rightarrow V = F(e_j,W) \Rightarrow V - W_j = C(e_j) \ge \delta > 0,$$

hence the mode $m$ is absolutely $\delta$-causal. □
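
The practical content of Theorem 3.1 is that a concave cost never needs a minimization over the whole simplex. The Python sketch below compares the vertex formula with a brute-force grid search for $n = 2$; both function names and the sample concave cost are hypothetical and chosen only for illustration.

```python
def mode_value_at_vertices(cost_at_vertex, W):
    """V = min_j { C(e_j) + W_j }, valid when C is concave (Theorem 3.1)."""
    return min(c + w for c, w in zip(cost_at_vertex, W))


def mode_value_bruteforce(cost, W, grid=400):
    """Grid minimization of C(xi) + xi . W over the simplex, n = 2 only."""
    return min(
        cost((k / grid, 1.0 - k / grid))
        + (k / grid) * W[0] + (1.0 - k / grid) * W[1]
        for k in range(grid + 1)
    )
```

For a concave cost such as $2 + \sqrt{\xi_1} + \sqrt{\xi_2}$ the two evaluations agree, but the first one needs only $n$ cost evaluations per update.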

Homogeneous cost functions naturally arise in many SSPs. We recall that a function $L(y)$ is absolutely
homogeneous of degree $d$ if $L(ay) = |a|^d L(y)$ for all $y \in R^n$, $a \in R$. If $L$ is also smooth, by Euler's
Homogeneous Function Theorem, it satisfies the equation $y^T \nabla L(y) = d\,L(y)$.

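Lemma 3.1 Suppose the cost

(A4) $C : R^n \mapsto R_+$ is continuously differentiable and absolutely homogeneous of degree $d$.

Then for every $W \in R^n_{+,0}$, $\xi^* \in \Xi^*(W)$ we have

$$(\xi^*_j > 0) \Rightarrow V - W_j = \frac{\partial C}{\partial \xi_j}(\xi^*) - (d-1)C(\xi^*).$$

Proof. For all $j \in I(\xi^*)$ the Kuhn-Tucker optimality conditions state that

$$\lambda = W_j + \frac{\partial C}{\partial \xi_j}(\xi^*), \qquad (16)$$

where $\lambda$ is a Lagrange multiplier. We recall that $\sum_{j \in I(\xi^*)} \xi^*_j = 1$. Multiplying (16) by $\xi^*_j$ and summing
over all $j \in I(\xi^*)$ we obtain

$$\lambda = \sum_{j \in I(\xi^*)} \xi^*_j \lambda = \sum_{j \in I(\xi^*)} \xi^*_j \left( W_j + \frac{\partial C}{\partial \xi_j}(\xi^*) \right) = \sum_{j=1}^n \xi^*_j W_j + \sum_{j=1}^n \xi^*_j \frac{\partial C}{\partial \xi_j}(\xi^*). \qquad (17)$$

Thus, by Euler's Homogeneous Function Theorem,

$$\lambda = \sum_{j=1}^n \xi^*_j W_j + d\,C(\xi^*) = F(\xi^*,W) + (d-1)C(\xi^*). \qquad (18)$$

Since $\xi^*$ is a minimizer, $V = F(\xi^*,W) = \lambda - (d-1)C(\xi^*)$ and it follows from (16) that

$$V - W_j = \frac{\partial C}{\partial \xi_j}(\xi^*) - (d-1)C(\xi^*) \quad \text{for all } W \in R^n_{+,0},\ \xi^* \in \Xi^*(W),\ j \in I(\xi^*).$$

□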

Theorem 3.2 If $C$ satisfies (A4) and

(A5) $\frac{\partial C}{\partial \xi_j}(\xi) - (d-1)C(\xi) \ge \delta \ge 0$ for $\forall \xi \in \Xi_n$, $\forall j \in I(\xi)$,

then the mode is absolutely $\delta$-causal.

Proof. If $\xi^* \in \Xi^*(W)$ and $\xi^*_j > 0$, the condition (A5) and Lemma 3.1 imply that $V - W_j \ge \delta \ge 0$.

□

Remark 3.3 The case most frequently encountered in applications of SSPs is the homogeneity of degree
one. When $d = 1$, equation (18) states that $\lambda = V$ and the condition (A5) becomes even simpler:

(A5') $\frac{\partial C}{\partial \xi_j}(\xi) \ge \delta \ge 0$ for $\forall \xi \in \Xi_n$, $\forall j \in I(\xi)$.

Lemma 3.1 and Theorem 3.2 can be viewed as generalizations of the key idea in proofs of causality in [33]
and [28, Appendix].
Remark 3.4 If (A4) holds and $C$ is strictly convex, then (A5) is a necessary condition for the absolute
$\delta$-causality of the mode. Indeed, suppose (A5) is violated for some $\xi \in \Xi_n$, $j \in I(\xi)$ and let
$K = 1 + \max_i \frac{\partial C}{\partial \xi_i}(\xi)$. If for each $i = 1, \ldots, n$ we choose $W_i = K - \frac{\partial C}{\partial \xi_i}(\xi)$, this ensures that $W \in R^n_{+,0}$, $K = \lambda$,
and $\Xi^*(W) = \{\xi\}$, which implies $V < W_j + \delta$ even though $\xi_j > 0$.

Lemma 3.2 Suppose the cost

(A6) $C : R^n \mapsto R_+$ is twice continuously differentiable.

Then for every $W \in R^n_{+,0}$, $\xi^* \in \Xi^*(W)$, $j \in I(\xi^*)$ there exists a point $\tilde\xi$ on the straight line segment
$[e_j, \xi^*]$ such that

$$V - W_j = C(e_j) - \frac{1}{2}(e_j - \xi^*)^T H(\tilde\xi)(e_j - \xi^*),$$

where $H$ is the Hessian matrix of $C(\xi)$.

Proof. If $\xi^* \in \Xi^*(W)$ and $j \in I(\xi^*)$, then the Kuhn-Tucker optimality conditions yield two different
formulas (16) and (17) for the Lagrange coefficient $\lambda$. Combining these we see that

$$V - W_j = C(\xi^*) + (e_j - \xi^*)^T \nabla C(\xi^*).$$

By Taylor's theorem there exists a point $\tilde\xi \in [e_j, \xi^*] \subset \Xi_n$ such that

$$C(e_j) = C(\xi^*) + (e_j - \xi^*)^T \nabla C(\xi^*) + \frac{1}{2}(e_j - \xi^*)^T H(\tilde\xi)(e_j - \xi^*);$$

thus, $V - W_j = C(e_j) - \frac{1}{2}(e_j - \xi^*)^T H(\tilde\xi)(e_j - \xi^*)$. □



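Theorem 3.3 Consider an $n$ by $(n-1)$ matrix $B$, whose columns form an orthonormal basis for the
subspace orthogonal to $[1, \ldots, 1]^T \in R^n$. Suppose the cost $C$ satisfies (A6) and $\tilde H(\xi) = B^T H(\xi) B$ is its
projected Hessian. If $\lambda(\tilde H(\xi))$ is the maximum eigenvalue of $\tilde H(\xi)$ and

(A7) $\min_i C(e_i) \ge \delta + \max\left\{0,\ \max_{\xi \in \Xi_n} \lambda(\tilde H(\xi))\right\}$,

then the mode is absolutely $\delta$-causal.

Proof. First, we assume that $\max_{\xi \in \Xi_n} \lambda(\tilde H(\xi)) > 0$ (the other case is already covered by Theorem
3.1). If $\xi^* \in \Xi^*(W)$ and $j \in I(\xi^*)$, then there exists $\beta \in R^{n-1}$ such that $(e_j - \xi^*) = B\beta$. We note that
$\|\beta\| = \|e_j - \xi^*\| \le \sqrt{2}$. Since Lemma 3.2 applies,

$$V - W_j = C(e_j) - \frac{1}{2}\beta^T \tilde H(\tilde\xi)\beta \ge C(e_j) - \frac{1}{2}\|\beta\|^2 \lambda(\tilde H(\tilde\xi)) \ge \min_i C(e_i) - \max_{\xi \in \Xi_n} \lambda(\tilde H(\xi)) \ge \delta.$$

□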
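
For a two-node mode the quantities in (A7) are easy to evaluate, since $B = (1, -1)^T/\sqrt{2}$ and the projected Hessian reduces to a scalar. The following Python sketch computes the largest $\delta$ allowed by (A7); the function name `a7_margin_two_nodes`, the sampling of $\xi$ on a uniform grid, and the tuple layout returned by `hessian` are assumptions of this illustration rather than anything prescribed by the text.

```python
def a7_margin_two_nodes(cost, hessian, grid=100):
    """For n = 2: min_i C(e_i) - max{0, max_xi lambda(B^T H(xi) B)}.

    With n = 2 the basis is B = (1, -1)^T / sqrt(2), so the projected
    Hessian is the scalar (H11 - 2*H12 + H22) / 2. hessian(xi) must
    return ((H11, H12), (H12, H22)). The maximum over xi is only
    sampled on a uniform grid (an assumption of this sketch).
    """
    lam_max = float("-inf")
    for k in range(grid + 1):
        xi = (k / grid, 1.0 - k / grid)
        (h11, h12), (_, h22) = hessian(xi)
        lam_max = max(lam_max, 0.5 * (h11 - 2.0 * h12 + h22))
    vertex_min = min(cost((1.0, 0.0)), cost((0.0, 1.0)))
    return vertex_min - max(0.0, lam_max)
```

For the quadratic cost $4 + (\xi_1 - 0.5)^2$ the Hessian is constant, the projected eigenvalue equals 1, and both vertices cost 4.25, so the returned margin is 3.25; any non-negative $\delta$ up to that margin satisfies (A7).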



Remark 3.5 Since the cost function is always evaluated on $\Xi_n$, the condition (A4) is somewhat awkward:
the cost can always be considered absolutely homogeneous of degree one since $C(\xi)$ can be replaced by
$\tilde C(\xi) = \|\xi\|_1\, C(\xi / \|\xi\|_1)$, which has the same values as $C$ on $\Xi_n$. A more meaningful question is: supposing
that $C$ is smooth and homogeneous of degree one, what additional conditions on $C$ and its directional
derivatives inside $\Xi_n$ are sufficient to guarantee (A5')? It is easy to see that (A7) is an answer to that
question since $\max_{\xi \in \Xi_n} \lambda(\tilde H(\xi))$ is the upper bound on the second derivative of $C$ restricted to any straight
line in $\Xi_n$.
4. MSSPs approximating continuous deterministic problems. As already mentioned, MSSPs
naturally arise in approximations of deterministic continuous optimal control problems. To illustrate this,
we consider a class of time-optimal trajectory problems. Many variants of these problems are studied in
robotic navigation, optimal control, and front propagation literature; a detailed discussion of the version
presented here can be found in [28].

Suppose $y(t) \in R^2$ is the vehicle's position at the time $t$ and the vehicle starts at $y(0) = x$ inside the
domain $\Omega$. We are free to choose any direction of motion (any vector in $S_1 = \{a \in R^2 \mid \|a\| = 1\}$), but
the speed will depend on the chosen direction and on the current position of the vehicle. The vehicle's
dynamics is governed by $y'(t) = f(y(t), a(t))a(t)$, where $f : R^2 \times S_1 \mapsto R$ is a Lipschitz-continuous speed
function satisfying $0 < F_1 \le f(x,a) \le F_2$ for all $x$ and $a$. Additional exit-time-penalty $q$ is incurred at
the boundary; we will assume that $q : \partial\Omega \mapsto R$ is non-negative and Lipschitz-continuous.
The goal is to cross the boundary $\partial\Omega$ as quickly as possible.

The value function of this problem is $u(x)$ (the minimal-time-to-exit after starting from $x$). It is
well-known that $u(x)$ is the unique viscosity solution [10] of the following static Hamilton-Jacobi-Bellman
PDE

$$\max_{a \in S_1} \{ (\nabla u(x) \cdot (-a)) f(x,a) \} = 1, \quad x \in \Omega \subset R^2;$$
$$u(x) = q(x), \quad x \in \partial\Omega. \qquad (19)$$

The optimal trajectories coincide with the characteristic curves of this PDE. If the problem is isotropic
(i.e., if $f(x,a) = f(x)$), the above PDE is equivalent to the usual Eikonal equation $\|\nabla u(x)\| f(x) = 1$
and the optimal trajectories coincide with the gradient lines of $u(x)$.

For simplicity we will first assume that the domain $\Omega$ is rectangular and that $X$ is a uniform Cartesian
grid on $\Omega$. Concentrating on one particular gridpoint $x \in X \cap \Omega$, we will number all of its neighbors as
in Figure 7. Suppose that the optimal initial direction of motion $a$ lies in the first quadrant and assume
that the corresponding optimal trajectory remains a straight line until intersecting the segment $x_1x_3$ at
some point $\tilde x$ (see Figure 7A). Then, letting $\tilde x = \xi_1 x_1 + \xi_2 x_3$, a linear approximation yields

$$u(x) \approx \frac{\|\tilde x - x\|}{f(x,a)} + \xi_1 u(x_1) + \xi_2 u(x_3).$$

Of course, since $\tilde x$ is not a priori known, we would have to minimize over all possible intersection points
and all four quadrants.




Figure 7: Two simple stencils using 4 nearest neighbors (A) or 8 nearest neighbors (B) on a uniform Cartesian grid.

We will enumerate all quadrants as follows: $\mathcal{M}(x) = \{(x_1,x_3), (x_3,x_5), (x_5,x_7), (x_7,x_1)\}$. For any
$m = (z^m_1, z^m_2) \in \mathcal{M}(x)$ and any $\xi \in \Xi_2$ we can similarly denote

$$\tilde x_\xi = \xi_1 z^m_1 + \xi_2 z^m_2; \quad \tau(\xi) = \|\tilde x_\xi - x\|; \quad a_\xi = (\tilde x_\xi - x)/\tau(\xi).$$
We can now state a semi-Lagrangian discretization of the PDE (19):

$$U(x) = \min_{m \in \mathcal{M}(x)} \min_{\xi \in \Xi_2} \left\{ \frac{\tau(\xi)}{f(x,a_\xi)} + \left( \xi_1 U(z^m_1) + \xi_2 U(z^m_2) \right) \right\}, \quad \text{for } \forall x \in X \cap \Omega; \qquad (20)$$
$$U(x) = q(x), \quad \text{for } \forall x \in X \cap \partial\Omega.$$

This (fully deterministic) derivation is similar to the one used by Gonzales and Rofman in [14].

On the other hand, it is easy to see that this system of equations also describes the value function for
an MSSP on $X \cup \{t\}$:

• for the nodes $x \in X \cap \partial\Omega$, there is a single (deterministic) transition to $t$ with the cost $q(x)$;

• for the nodes $x \in X \cap \Omega$, the set of quadrants $\mathcal{M}(x)$ can be interpreted as a set of modes and
$C^m(x,\xi) = \tau(\xi)/f(x,a_\xi)$.

This interpretation is in the spirit of Kushner's and Dupuis' approach of approximating continuous
optimal control by controlled Markov processes [18].



On a uniform Cartesian grid and the stencil of Figure 7A, we can express $\tau(\xi) = h\sqrt{\xi_1^2 + \xi_2^2}$, where
$h$ is the grid size. If the problem is isotropic, the cost function becomes $C(x,\xi) = (h/f(x))\sqrt{\xi_1^2 + \xi_2^2}$. A
similar construction in $R^n$ leads to modes containing $n$ neighbor-nodes each and the cost function

$$C(x,\xi) = \frac{h}{f(x)} \sqrt{\sum_{i=1}^n \xi_i^2}. \qquad (21)$$

This function is homogeneous of degree one in terms of $\xi$; moreover, $\frac{\partial C}{\partial \xi_j}(x,\xi) = \frac{h}{f(x)} \frac{\xi_j}{\|\xi\|}$, which is
positive if and only if $\xi_j > 0$. By Theorem 3.2 each mode is absolutely causal and a Dijkstra-like
method can be used to solve the problem. This is in fact the first of two methods introduced by Tsitsiklis
in [33]. Since this $C$ is also convex in $\xi$, Remark 3.4 shows that the modes are not absolutely $\delta$-causal
for any $\delta > 0$; hence, Dial's method is generally not applicable.
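
For one quadrant of the isotropic 4-neighbor stencil, the minimization over $\xi \in \Xi_2$ can be carried out explicitly. The Python sketch below compares a grid minimization of the semi-Lagrangian expression with the standard upwind quadratic formula; that closed form is a well-known fact about this discretization rather than something derived in the text, and both function names are hypothetical.

```python
import math


def update_closed_form(u1, u2, h, f):
    """Upwind quadratic formula for one quadrant of the 4-neighbor stencil."""
    c = h / f
    if abs(u1 - u2) >= c:             # the optimal xi is a vertex of the simplex
        return min(u1, u2) + c
    return 0.5 * (u1 + u2 + math.sqrt(2.0 * c * c - (u1 - u2) ** 2))


def update_semi_lagrangian(u1, u2, h, f, grid=20000):
    """Grid minimization of (h/f)*sqrt(xi1^2 + xi2^2) + xi1*u1 + xi2*u2."""
    c = h / f
    return min(
        c * math.hypot(k / grid, 1.0 - k / grid)
        + (k / grid) * u1 + (1.0 - k / grid) * u2
        for k in range(grid + 1)
    )
```

In a Dijkstra-like method only the closed form would be used; the grid version is included here solely to confirm that both express the same update.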

Another obvious computational stencil in $R^2$ uses all 8 neighboring gridpoints as shown in Figure
7B. Here the optimal trajectory is still assumed to remain a straight line until the intersection with a
segment, but the list of segments is different:

$$\mathcal{M}(x) = \{(x_1,x_2), (x_3,x_2), (x_3,x_4), (x_5,x_4), (x_5,x_6), (x_7,x_6), (x_7,x_8), (x_1,x_8)\}.$$

The discretized equation (20) still holds, but the difference is that

$$\tau(\xi) = \|\tilde x_\xi - x\| = \|(\xi_1 z^m_1 + \xi_2 z^m_2) - x\| = h\sqrt{1 + \xi_2^2}.$$

If the problem is isotropic, the cost function becomes $C(x,\xi) = (h/f(x))\sqrt{1 + \xi_2^2}$. Theorem 3.3 is certainly
applicable, but instead we rewrite the above as a function homogeneous of degree one (see Remark 3.5):
$C(x,\xi) = (h/f(x))\sqrt{(\xi_1 + \xi_2)^2 + \xi_2^2}$. We now note that

$$\frac{\partial C}{\partial \xi_1}(x,\xi) = \frac{h^2}{f(x)} \frac{\xi_1 + \xi_2}{\tau(\xi)}; \qquad \frac{\partial C}{\partial \xi_2}(x,\xi) = \frac{h^2}{f(x)} \frac{\xi_1 + 2\xi_2}{\tau(\xi)}.$$

Since $\tau(\xi) \le h\sqrt{2}$, we conclude that

$$\frac{\partial C^m}{\partial \xi_j}(x,\xi) \ge \frac{h}{F_2\sqrt{2}} = \delta > 0, \quad \text{for } \forall x \in \Omega,\ \forall m \in \mathcal{M}(x),\ \forall \xi \in \Xi_2,\ j = 1,2. \qquad (22)$$

By Theorem 3.2 each mode is absolutely $\delta$-causal and a Dial-like method can be used with buckets
of width $\delta$ (corresponding to the second method introduced in [33]).

More generally, suppose that $X$ is a simplicial mesh on $\Omega \subset R^n$ with the minimum edge-length of $h$.
Let $S(x)$ be the set of all simplexes in the mesh that use $x$ as one of the vertices. Each such simplex
$s \in S(x)$ corresponds to a single mode $m \in \mathcal{M}(x)$ consisting of all other vertices of $s$ besides $x$. For any
mode $m = (z^m_1, \ldots, z^m_n)$ and any $\xi \in \Xi_n$ we can similarly define

$$\tilde x^m_\xi = \sum_{i=1}^n \xi_i z^m_i; \quad \tau^m(\xi) = \|\tilde x^m_\xi - x\|; \quad a^m_\xi = (\tilde x^m_\xi - x)/\tau^m(\xi).$$

Since $\tau^m(\xi) = \sqrt{\sum_{j=1}^n \sum_{k=1}^n \xi_j \xi_k (z^m_j - x)^T (z^m_k - x)}$, we see that

$$\frac{\partial \tau^m}{\partial \xi_j}(\xi) = \frac{(z^m_j - x)^T(\tilde x^m_\xi - x)}{\tau^m(\xi)} = (z^m_j - x)^T a^m_\xi = \|z^m_j - x\| \cos\beta_{\xi,j},$$

where $\beta_{\xi,j}$ is the angle between $a^m_\xi$ and $(z^m_j - x)$. Suppose $\beta(x,m)$ is the maximum angle between a
pair of vectors $(z^m_i - x)$ and $(z^m_k - x)$, maximizing over all $i,k \in \{1, \ldots, n\}$. Furthermore, define

$$\beta(x) = \max_{m \in \mathcal{M}(x)} \beta(x,m); \qquad \beta = \max_{x} \beta(x).$$

Since $a^m_\xi$ lies in the cone defined by $(z^m_1 - x), \ldots, (z^m_n - x)$, we know that $\beta_{\xi,j} \le \beta(x,m) \le \beta(x) \le \beta$.
Therefore,

$$(\beta < \pi/2) \Rightarrow \frac{\partial \tau^m}{\partial \xi_j}(\xi) = \|z^m_j - x\| \|a^m_\xi\| \cos\beta_{\xi,j} \ge h\cos\beta > 0.$$

The dynamic programming equations in this case become

$$U(x) = \min_{m \in \mathcal{M}(x)} \min_{\xi \in \Xi_n} \left\{ \frac{\tau^m(\xi)}{f(x,a^m_\xi)} + \left( \sum_{i=1}^n \xi_i U(z^m_i) \right) \right\}, \quad \text{for } \forall x \in X \cap \Omega; \qquad (23)$$
$$U(x) = q(x), \quad \text{for } \forall x \in X \cap \partial\Omega.$$

The cost function $C^m(x,\xi) = \tau^m(\xi)/f(x,a^m_\xi)$ is homogeneous of degree one in $\xi$. For the isotropic case,
we see that

$$\frac{\partial C^m}{\partial \xi_j}(x,\xi) = \frac{1}{f(x)} \frac{\partial \tau^m}{\partial \xi_j}(\xi) = \frac{\|z^m_j - x\| \cos\beta_{\xi,j}}{f(x)}.$$

Thus, for the Eikonal PDE on any acute mesh (i.e., for $\beta \le \pi/2$), each mode of the discretization is
absolutely causal by Theorem 3.2 and a Dijkstra-like method is applicable (this is a re-derivation of the
result in [28, Appendix]). Moreover, if $\beta < \pi/2$ then $\frac{\partial C^m}{\partial \xi_j}(x,\xi) \ge \frac{h\cos\beta}{F_2} = \delta > 0$. This provides the optimal
bucket-width $\delta$ to use in a Dial-like method when solving the Eikonal PDE on any acute mesh. As far as
we know, no general formula for $\delta$ has been derived elsewhere up till now.
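
The bucket-width formula $\delta = h\cos\beta / F_2$ is straightforward to evaluate from the stencil geometry. The Python sketch below does this for the modes attached to a single node; the function name `bucket_width` and the input layout are hypothetical, and for a global bucket width one would take $h$ and $\beta$ over the whole mesh (on a uniform stencil the per-node and global values coincide).

```python
import math


def bucket_width(x, modes, F2):
    """delta = h * cos(beta) / F2 for one node x, or 0.0 if beta >= pi/2.

    modes is a list of tuples of neighbor coordinates (one tuple per mode);
    h is the shortest edge from x and beta the largest angle between two
    edges of the same mode. F2 is the upper bound on the speed f.
    """
    h, cos_beta = float("inf"), 1.0
    for mode in modes:
        edges = [[zi - xi for zi, xi in zip(z, x)] for z in mode]
        norms = [math.sqrt(sum(c * c for c in e)) for e in edges]
        h = min(h, min(norms))
        for a in range(len(edges)):
            for b in range(a + 1, len(edges)):
                dot = sum(p * q for p, q in zip(edges[a], edges[b]))
                cos_beta = min(cos_beta, dot / (norms[a] * norms[b]))
    return h * cos_beta / F2 if cos_beta > 0.0 else 0.0
```

For modes built from an axis neighbor and a diagonal neighbor on a grid of size $h$, the angle is $\pi/4$ and the function returns $h/(F_2\sqrt{2})$; for modes built from two axis neighbors the angle is $\pi/2$ and the returned width is zero.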

We note that the last result is applicable even in a more general situation, when the computational
stencil $S(x)$ does not correspond to a set of non-overlapping simplexes present in the mesh. E.g., for
the example in Figure 7B, $\beta = \pi/4$ and this yields the same $\delta$ as in (22). That leads to an interesting
dilemma: including more nearby nodes into a computational stencil usually decreases $\beta$ and increases the
bucket-width, thus reducing the total number of "bucket-acceptance" steps until the termination of Dial's
algorithm. On the other hand, a larger $S(x)$ increases both the computational complexity of a single
step (more tentative labels to update after each acceptance) and the discretization error (proportional
to $h$ in the above examples). Finding an optimal way of handling this trade-off could further speed up
non-iterative methods for Eikonal PDEs on acute meshes. We note that $h/F_2$ remains the upper bound
for $\delta$ and corresponds to the situation when the vehicle is allowed to move only along the directions
$(z^m_j - x)$.

A much harder question is the applicability of label-setting methods to semi-Lagrangian discretizations
of anisotropic optimal control problems. It is well-known that equations (23) are generally not causal;
this issue is discussed in detail in [28, 34]. On uniform Cartesian grids, the criteria for applicability of
a Dijkstra-like method to anisotropic problems were previously provided in [24], [19], and more recently
in [2]. All of these criteria are grid-orientation dependent; i.e., given a Hamilton-Jacobi PDE, its
semi-Lagrangian or Eulerian discretization may or may not be computed correctly by a Dijkstra-like method
depending on whether the anisotropy in the PDE is aligned with the grid directions. Here we provide a
criterion for applicability of a Dijkstra-like method for discretizations based on arbitrary acute stencils.
In the anisotropic case,

$$\frac{\partial C^m}{\partial \xi_j}(x,\xi) = \frac{1}{f(x,a^m_\xi)} \frac{\partial \tau^m}{\partial \xi_j}(\xi) - \frac{\tau^m(\xi)}{f^2(x,a^m_\xi)} \frac{\partial f}{\partial \xi_j}(x,a^m_\xi).$$

Suppose that there exists $\delta \ge 0$ such that for $\forall x \in X \cap \Omega$, $\forall m \in \mathcal{M}(x)$, $\forall \xi \in \Xi_n$, $\forall j \in \{1, \ldots, n\}$

$$\frac{\partial f}{\partial \xi_j}(x,a^m_\xi) \le \frac{f(x,a^m_\xi)}{\tau^m(\xi)} \left( \frac{\partial \tau^m}{\partial \xi_j}(\xi) - \delta f(x,a^m_\xi) \right).$$
By Theorem 3.2 this implies that a Dijkstra-like method will be applicable and a Dial-like method
will also be applicable if $\delta > 0$. Building label-setting methods based on this sufficient condition could
potentially yield algorithms outperforming the Ordered Upwind Methods specially designed to restore
the causality of anisotropic problems by dynamically extending the stencil [571 1311 HH]. We intend to
explore this approach in future work.

5. Conclusions. We defined a large class of Multimode Stochastic Shortest Path problems and 
derived a number of sufficient conditions to check the applicability of the label-setting methods. We 
illustrated the usefulness of our approach to the numerical analysis of first-order non-linear boundary 
value problems by reinterpreting previous label-setting methods for the Eikonal PDE on Cartesian grids. 
For the Eikonal equation on arbitrary meshes, we re-interpreted the prior Dijkstra-like method and derived
a new formula for the bucket-width in Dial-like methods. We also developed a new sufficient condition for
the applicability of label-setting methods to anisotropic Hamilton-Jacobi PDEs on arbitrary stencils. 

In practice, the applicability of label-setting methods to a particular SSP can be tested directly in
$O(M)$ operations: upon the method's termination, a single value iteration can be applied and, if it
results in no changes, the value function was computed correctly. However, the sufficient conditions
(presented above for MSSPs) make it possible to avoid these additional computations.
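
The a posteriori test described above amounts to one pass over the nodes. A minimal Python sketch follows; the name `passes_single_sweep`, the dictionary representation of the value function, and the user-supplied callback `rhs` are all hypothetical choices made for illustration.

```python
def passes_single_sweep(U, rhs, tol=1e-9):
    """True if one value-iteration sweep leaves every U[x] unchanged.

    U maps each non-terminal node to its computed value; rhs(x, U)
    evaluates the right-hand side of the dynamic programming equation
    at x (a hypothetical callback supplied by the user).
    """
    return all(abs(rhs(x, U) - U[x]) <= tol for x in U)
```

The tolerance only absorbs round-off; a value function that fails the check indicates that the label-setting method accepted some node too early.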

Unfortunately, the framework of MSSPs is not flexible enough to express many common discrete 
stochastic control problems, where not all possible probability distributions over successor nodes are 
available. Nevertheless, we hope that the key idea of our approach (splitting the original MSSP into 
a number of absolutely causal auxiliary problems) can be generalized to test the applicability of label- 
setting methods to other SSPs. Since SSPs can be naturally extended to describe stochastic games on 
graphs [2U], we also intend to investigate the applicability of our approach to the latter. If successful, 
this will potentially yield efficient numerical methods for a wide class of first and second order static 
Hamilton-Jacobi equations. 

In Dial-like methods, the bucket width can be sometimes adjusted on the fly based on the not-yet- 
accepted part of the problem only. We expect such extensions to be advantageous for problems, where the 
cost function C has very different lower bounds for different nodes. Another open question of practical 
importance is the use of label-setting methods to obtain an approximation of the value function for
non-causal SSPs. Recently, a numerical method based on a related idea was introduced in [3S] for Eikonal
PDEs: a Dial-like method is used with buckets of width $\delta$ for a discretization that is not $\delta$-causal. This
introduces additional errors (analyzed in [22]), but decreases the method's running time.

Finally, the performance comparison of label-setting and label-correcting methods on MSSPs is yet
another interesting topic for future research.